Air Travel Consumer Report Period From June 2003 to October 2020
Airport delay statistics (Bureau of Transportation Statistics): Dataset obtained from: Dataset link
Dataset obtained from: Aviation Support Tables
airports.csv describes the locations of US airports, with the fields:
This majority of this data comes from the FAA, but a few extra airports (mainly military bases and US protectorates) were collected from other web sources by Ryan Hafen and Hadley Wickham.
Carrier delay is within the control of the air carrier. Examples of occurrences that may determine carrier delay are: aircraft cleaning, aircraft damage, awaiting the arrival of connecting passengers or crew, baggage, bird strike, cargo loading, catering, computer, outage-carrier equipment, crew legality (pilot or attendant rest), damage by hazardous goods, engineering inspection, fueling, handling disabled passengers, late crew, lavatory servicing, maintenance, oversales, potable water servicing, removal of unruly passenger, slow boarding or seating, stowing carry-on baggage, weight and balance delays.
Arrival delay at an airport due to the late arrival of the same aircraft at a previous airport. The ripple effect of an earlier delay at downstream airports is referred to as delay propagation.
Delay that is within the control of the National Airspace System (NAS) may include: non-extreme weather conditions, airport operations, heavy traffic volume, air traffic control, etc. Delays that occur after Actual Gate Out are usually attributed to the NAS and are also reported through OPSNET.
Security delay is caused by evacuation of a terminal or concourse, re-boarding of aircraft because of security breach, inoperative screening equipment and/or long lines in excess of 29 minutes at screening areas.
Weather Delay Weather delay is caused by extreme or hazardous weather conditions that are forecasted or manifest themselves on point of departure, enroute, or on point of arrival.
Delays to Instrument Flight Rules (IFR) traffic of 15 minutes or more, experienced by individual flights, which result from the ATC system detaining an aircraft at the gate, short of the runway, on the runway, on a taxiway, and/or in a holding configuration anywhere en route.
Such delays include delays due to weather conditions at airports and en route (Weather), FAA and non-FAA equipment malfunctions (Equipment), the volume of traffic at an airport (Volume), reduction to runway capacity (Runway), and other factors (Others). Flight delays of less than 15 minutes are not reported in OPSNET. ASPM reports the most dominant OPSNET delay cause for any flight with an ASQP Reported NAS Delay.
A diverted flight is one that has been routed from its original arrival destination to a new, typically temporary, arrival destination. The leg of the flight that is routed back to the original arrival destination is called the recovery leg.
When you are viewing flight information for a diverted flight, you will see the diversion leg. The recovery leg will be displayed only if available.
# Necessary imports
import os #this module routines for NT or Posix depending on what system we're on.
import pandas as pd # this module
pd.set_option('display.max_columns', None)
import numpy as np # library used for working with arrays. It also has functions for working in domain of linear algebra
import types
import csv
import seaborn as sb # provides a high-level interface for drawing attractive and informative statistical graphics.
import time # This module provides various functions to manipulate time values.
import pandocfilters # This Functions to aid writing python scripts that process the pandoc AST serialized as JSON.
import nbconvert # This module converting notebooks to and from different formats
import pyppeteer # Generate screenshots and PDFs of pages
import pip
print ("libarary-Imported")
libarary-Imported
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
%matplotlib inline
import plotly as px
print ("libarary-Imported")
libarary-Imported
#folium makes it easy to visualize data that’s been manipulated in Python on an interactive leaflet map
import folium
from folium import plugins
import plotly.express as px #Plotly Express is a terse, consistent, high-level API for creating figures
print ("libarary-Imported")
libarary-Imported
#geospatial data visualization library
import geoplot as gplt #geoplot is a geospatial data visualization library designed for data scientists and geospatial analysts
import geopandas as gpd #GeoPandas depends for its spatial functionality (GEOS, GDAL, PROJ).
import geoplot.crs as gcrs
print ("libarary-Imported")
libarary-Imported
#converting notebooks to different format
from IPython.display import HTML
from timeit import default_timer as timer
from pandas.core.tools.datetimes import to_datetime
from pandas.core.tools.timedeltas import to_timedelta
from nbconvert import LatexExporter
from nbconvert import PDFExporter
from nbconvert import webpdf
from nbconvert import nbconvertapp
from nbconvert import templates
print ("functions-Imported")
functions-Imported
#Determine current working directory
import os
os.getcwd()
'C:\\Users\\Abdelrazek\\PycharmProjects\\Flight'
# Read the csv file, and check its top 5 rows airports data frame
airports = pd.read_csv('airports_Master_Coordinate.csv')
print(airports.shape)
airports.head()
(18109, 10)
| AIRPORT_ID | AIRPORT | DISPLAY_AIRPORT_NAME | DISPLAY_AIRPORT_CITY_NAME_FULL | AIRPORT_COUNTRY_NAME | AIRPORT_STATE_NAME | DISPLAY_CITY_MARKET_NAME_FULL | LATITUDE | LONGITUDE | Unnamed: 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 01A | Afognak Lake Airport | Afognak Lake, AK | United States | Alaska | Afognak Lake, AK | 58.109444 | -152.906667 | NaN |
| 1 | 10003 | 03A | Bear Creek Mining Strip | Granite Mountain, AK | United States | Alaska | Granite Mountain, AK | 65.548056 | -161.071667 | NaN |
| 2 | 10004 | 04A | Lik Mining Camp | Lik, AK | United States | Alaska | Lik, AK | 68.083333 | -163.166667 | NaN |
| 3 | 10005 | 05A | Little Squaw Airport | Little Squaw, AK | United States | Alaska | Little Squaw, AK | 67.570000 | -148.183889 | NaN |
| 4 | 10006 | 06A | Kizhuyak Bay | Kizhuyak, AK | United States | Alaska | Kizhuyak, AK | 57.745278 | -152.882778 | NaN |
#check COUNTRY_NAME
airports['AIRPORT_COUNTRY_NAME'].values
array(['United States', 'United States', 'United States', ..., 'Somalia',
'South Sudan', 'United States'], dtype=object)
#filter airports data frame to get usa_airports
usa_airports=airports.loc[airports['AIRPORT_COUNTRY_NAME'] =='United States']
print(usa_airports.shape)
usa_airports.head()
(6931, 10)
| AIRPORT_ID | AIRPORT | DISPLAY_AIRPORT_NAME | DISPLAY_AIRPORT_CITY_NAME_FULL | AIRPORT_COUNTRY_NAME | AIRPORT_STATE_NAME | DISPLAY_CITY_MARKET_NAME_FULL | LATITUDE | LONGITUDE | Unnamed: 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 01A | Afognak Lake Airport | Afognak Lake, AK | United States | Alaska | Afognak Lake, AK | 58.109444 | -152.906667 | NaN |
| 1 | 10003 | 03A | Bear Creek Mining Strip | Granite Mountain, AK | United States | Alaska | Granite Mountain, AK | 65.548056 | -161.071667 | NaN |
| 2 | 10004 | 04A | Lik Mining Camp | Lik, AK | United States | Alaska | Lik, AK | 68.083333 | -163.166667 | NaN |
| 3 | 10005 | 05A | Little Squaw Airport | Little Squaw, AK | United States | Alaska | Little Squaw, AK | 67.570000 | -148.183889 | NaN |
| 4 | 10006 | 06A | Kizhuyak Bay | Kizhuyak, AK | United States | Alaska | Kizhuyak, AK | 57.745278 | -152.882778 | NaN |
#create usa_airports csv file
usa_airports.to_csv (r'C:\Users\Abdelrazek\PycharmProjects\Flight\usa_airports.csv', index = False, header=True)
usa_airports_df = pd.read_csv('usa_airports.csv')
usa_airports_df.head(2)
| AIRPORT_ID | AIRPORT | DISPLAY_AIRPORT_NAME | DISPLAY_AIRPORT_CITY_NAME_FULL | AIRPORT_COUNTRY_NAME | AIRPORT_STATE_NAME | DISPLAY_CITY_MARKET_NAME_FULL | LATITUDE | LONGITUDE | Unnamed: 9 | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 01A | Afognak Lake Airport | Afognak Lake, AK | United States | Alaska | Afognak Lake, AK | 58.109444 | -152.906667 | NaN |
| 1 | 10003 | 03A | Bear Creek Mining Strip | Granite Mountain, AK | United States | Alaska | Granite Mountain, AK | 65.548056 | -161.071667 | NaN |
#drop columan without value
usa_airports_df_1=usa_airports_df.drop(columns=['Unnamed: 9'])
usa_airports_df_1.head(2)
| AIRPORT_ID | AIRPORT | DISPLAY_AIRPORT_NAME | DISPLAY_AIRPORT_CITY_NAME_FULL | AIRPORT_COUNTRY_NAME | AIRPORT_STATE_NAME | DISPLAY_CITY_MARKET_NAME_FULL | LATITUDE | LONGITUDE | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10001 | 01A | Afognak Lake Airport | Afognak Lake, AK | United States | Alaska | Afognak Lake, AK | 58.109444 | -152.906667 |
| 1 | 10003 | 03A | Bear Creek Mining Strip | Granite Mountain, AK | United States | Alaska | Granite Mountain, AK | 65.548056 | -161.071667 |
#Explore what the histogram of the data looks like
usa_airports_df_1.hist(figsize=(10,10));
# display a summary of the dataframe
usa_airports_df_1.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 6931 entries, 0 to 6930 Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 AIRPORT_ID 6931 non-null int64 1 AIRPORT 6931 non-null object 2 DISPLAY_AIRPORT_NAME 6931 non-null object 3 DISPLAY_AIRPORT_CITY_NAME_FULL 6931 non-null object 4 AIRPORT_COUNTRY_NAME 6931 non-null object 5 AIRPORT_STATE_NAME 6931 non-null object 6 DISPLAY_CITY_MARKET_NAME_FULL 6931 non-null object 7 LATITUDE 6930 non-null float64 8 LONGITUDE 6930 non-null float64 dtypes: float64(2), int64(1), object(6) memory usage: 487.5+ KB
# Read the csv file, and check its top 2 rows
airline_delay_causes = pd.read_csv('airline_delay_causes.csv')
print(airline_delay_causes.shape)
airline_delay_causes.head(2)
(289637, 22)
| year | month | carrier | carrier_name | airport | airport_name | arr_flights | arr_del15 | carrier_ct | weather_ct | nas_ct | security_ct | late_aircraft_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | weather_delay | nas_delay | security_delay | late_aircraft_delay | Unnamed: 21 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | PBI | West Palm Beach/Palm Beach, FL: Palm Beach Int... | 650.0 | 126.0 | 21.06 | 6.44 | 51.58 | 1.0 | 45.92 | 4.0 | 0.0 | 5425.0 | 881.0 | 397.0 | 2016.0 | 15.0 | 2116.0 | NaN |
| 1 | 2004 | 1 | DL | Delta Air Lines Inc. | PDX | Portland, OR: Portland International | 314.0 | 61.0 | 14.09 | 2.61 | 34.25 | 0.0 | 10.05 | 30.0 | 3.0 | 2801.0 | 478.0 | 239.0 | 1365.0 | 0.0 | 719.0 | NaN |
# display a summary of the dataframe
airline_delay_causes.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 289637 entries, 0 to 289636 Data columns (total 22 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 year 289637 non-null int64 1 month 289637 non-null int64 2 carrier 289637 non-null object 3 carrier_name 289637 non-null object 4 airport 289637 non-null object 5 airport_name 289637 non-null object 6 arr_flights 289193 non-null float64 7 arr_del15 288968 non-null float64 8 carrier_ct 289193 non-null float64 9 weather_ct 289193 non-null float64 10 nas_ct 289193 non-null float64 11 security_ct 289193 non-null float64 12 late_aircraft_ct 289193 non-null float64 13 arr_cancelled 289193 non-null float64 14 arr_diverted 289193 non-null float64 15 arr_delay 289193 non-null float64 16 carrier_delay 289193 non-null float64 17 weather_delay 289193 non-null float64 18 nas_delay 289193 non-null float64 19 security_delay 289193 non-null float64 20 late_aircraft_delay 289193 non-null float64 21 Unnamed: 21 0 non-null float64 dtypes: float64(16), int64(2), object(4) memory usage: 48.6+ MB
#rename columns strat with space
airline_delay_causes.rename(columns={'airport' : 'AIRPORT',' month' : 'month',' weather_ct': 'weather_ct',' arr_delay' : 'arr_delay',' carrier_delay' : 'carrier_delay'}, inplace=True)
# display a summary of the dataframe
airline_delay_causes.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 289637 entries, 0 to 289636 Data columns (total 22 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 year 289637 non-null int64 1 month 289637 non-null int64 2 carrier 289637 non-null object 3 carrier_name 289637 non-null object 4 AIRPORT 289637 non-null object 5 airport_name 289637 non-null object 6 arr_flights 289193 non-null float64 7 arr_del15 288968 non-null float64 8 carrier_ct 289193 non-null float64 9 weather_ct 289193 non-null float64 10 nas_ct 289193 non-null float64 11 security_ct 289193 non-null float64 12 late_aircraft_ct 289193 non-null float64 13 arr_cancelled 289193 non-null float64 14 arr_diverted 289193 non-null float64 15 arr_delay 289193 non-null float64 16 carrier_delay 289193 non-null float64 17 weather_delay 289193 non-null float64 18 nas_delay 289193 non-null float64 19 security_delay 289193 non-null float64 20 late_aircraft_delay 289193 non-null float64 21 Unnamed: 21 0 non-null float64 dtypes: float64(16), int64(2), object(4) memory usage: 48.6+ MB
#get all coulman names
airline_delay_causes.columns
Index(['year', 'month', 'carrier', 'carrier_name', 'AIRPORT', 'airport_name',
'arr_flights', 'arr_del15', 'carrier_ct', 'weather_ct', 'nas_ct',
'security_ct', 'late_aircraft_ct', 'arr_cancelled', 'arr_diverted',
'arr_delay', 'carrier_delay', 'weather_delay', 'nas_delay',
'security_delay', 'late_aircraft_delay', 'Unnamed: 21'],
dtype='object')
# drop duplicates of the dataframe
print(airline_delay_causes.duplicated().sum)
<bound method NDFrame._add_numeric_operations.<locals>.sum of 0 False
1 False
2 False
3 False
4 False
...
289632 False
289633 False
289634 False
289635 False
289636 False
Length: 289637, dtype: bool>
#drop columan without value
df_airline_delay_causes=airline_delay_causes.drop(columns=['Unnamed: 21'])
df_airline_delay_causes.head(2)
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | carrier_ct | weather_ct | nas_ct | security_ct | late_aircraft_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | weather_delay | nas_delay | security_delay | late_aircraft_delay | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | PBI | West Palm Beach/Palm Beach, FL: Palm Beach Int... | 650.0 | 126.0 | 21.06 | 6.44 | 51.58 | 1.0 | 45.92 | 4.0 | 0.0 | 5425.0 | 881.0 | 397.0 | 2016.0 | 15.0 | 2116.0 |
| 1 | 2004 | 1 | DL | Delta Air Lines Inc. | PDX | Portland, OR: Portland International | 314.0 | 61.0 | 14.09 | 2.61 | 34.25 | 0.0 | 10.05 | 30.0 | 3.0 | 2801.0 | 478.0 | 239.0 | 1365.0 | 0.0 | 719.0 |
# display a static summary of the dataframe
df_airline_delay_causes.describe()
| year | month | arr_flights | arr_del15 | carrier_ct | weather_ct | nas_ct | security_ct | late_aircraft_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | weather_delay | nas_delay | security_delay | late_aircraft_delay | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 289637.000000 | 289637.000000 | 289193.000000 | 288968.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 |
| mean | 2011.562369 | 6.516215 | 389.580585 | 75.004270 | 21.097864 | 2.675956 | 25.127567 | 0.173963 | 25.870624 | 7.331284 | 0.887660 | 4293.218484 | 1272.710221 | 221.832617 | 1152.813906 | 6.881021 | 1638.978288 |
| std | 5.161139 | 3.426717 | 1042.744693 | 203.225578 | 46.928801 | 10.173334 | 88.057001 | 0.829669 | 77.044038 | 38.028441 | 3.989153 | 12701.694767 | 3410.400202 | 861.617050 | 4809.587421 | 36.899165 | 5004.832854 |
| min | 2003.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | -0.010000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -19.000000 | 0.000000 | 0.000000 |
| 25% | 2007.000000 | 4.000000 | 60.000000 | 10.000000 | 3.230000 | 0.000000 | 1.950000 | 0.000000 | 1.820000 | 0.000000 | 0.000000 | 464.000000 | 157.000000 | 0.000000 | 63.000000 | 0.000000 | 88.000000 |
| 50% | 2011.000000 | 7.000000 | 124.000000 | 24.000000 | 8.530000 | 0.620000 | 5.920000 | 0.000000 | 6.250000 | 1.000000 | 0.000000 | 1254.000000 | 450.000000 | 27.000000 | 218.000000 | 0.000000 | 374.000000 |
| 75% | 2016.000000 | 9.000000 | 279.000000 | 58.000000 | 20.000000 | 2.080000 | 16.120000 | 0.000000 | 17.810000 | 4.000000 | 1.000000 | 3158.000000 | 1107.000000 | 163.000000 | 632.000000 | 0.000000 | 1153.000000 |
| max | 2020.000000 | 12.000000 | 21977.000000 | 6377.000000 | 1792.070000 | 717.940000 | 4091.270000 | 80.560000 | 1885.470000 | 4951.000000 | 256.000000 | 433687.000000 | 196944.000000 | 57707.000000 | 238440.000000 | 3194.000000 | 148181.000000 |
#Calculating the average for late_aircraft_dela
avg_late_aircraft_delay=df_airline_delay_causes.late_aircraft_delay.mean()
avg_late_aircraft_delay
1638.9782878562069
# index
df_airline_delay_causes.set_index(['year', 'month', 'carrier', 'carrier_name', 'AIRPORT', 'airport_name',
'arr_flights', 'arr_del15', 'carrier_ct', 'weather_ct', 'nas_ct',
'security_ct', 'late_aircraft_ct', 'arr_cancelled', 'arr_diverted',
'arr_delay', 'carrier_delay', 'weather_delay', 'nas_delay',
'security_delay', 'late_aircraft_delay'])
df_airline_delay_causes.index
RangeIndex(start=0, stop=289637, step=1)
#Calculating Difference from Average Delay
#add a new column in the dataframe name 'def_avg_Delay'
df_airline_delay_causes.reset_index
df_avg_delay_causes = df_airline_delay_causes.copy()
df_avg_delay_causes
df_avg_delay_causes['def_avg_Delay'] = df_avg_delay_causes['late_aircraft_delay'] -avg_late_aircraft_delay
df_avg_delay_causes.head(2)
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | carrier_ct | weather_ct | nas_ct | security_ct | late_aircraft_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | weather_delay | nas_delay | security_delay | late_aircraft_delay | def_avg_Delay | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | PBI | West Palm Beach/Palm Beach, FL: Palm Beach Int... | 650.0 | 126.0 | 21.06 | 6.44 | 51.58 | 1.0 | 45.92 | 4.0 | 0.0 | 5425.0 | 881.0 | 397.0 | 2016.0 | 15.0 | 2116.0 | 477.021712 |
| 1 | 2004 | 1 | DL | Delta Air Lines Inc. | PDX | Portland, OR: Portland International | 314.0 | 61.0 | 14.09 | 2.61 | 34.25 | 0.0 | 10.05 | 30.0 | 3.0 | 2801.0 | 478.0 | 239.0 | 1365.0 | 0.0 | 719.0 | -919.978288 |
#Explore what the histogram of the data looks like
df_avg_delay_causes.hist(figsize=(25,25));
# display a sum of zero value in the dataframe for each columan
# we will drop zero value as we will level up report data in additional steps
print((df_avg_delay_causes == 0).sum())
year 0 month 0 carrier 0 carrier_name 0 AIRPORT 0 airport_name 0 arr_flights 0 arr_del15 7595 carrier_ct 14808 weather_ct 112636 nas_ct 23600 security_ct 251295 late_aircraft_ct 33045 arr_cancelled 115905 arr_diverted 207682 arr_delay 7820 carrier_delay 14806 weather_delay 112614 nas_delay 23575 security_delay 251287 late_aircraft_delay 33040 def_avg_Delay 0 dtype: int64
#level up and calculate the Delay in (min/flight) for each airport and carrier
#add a new column in the dataframe name 'aircraft_delay(min/flight)'
df_avg_delay_causes_1 = df_avg_delay_causes.copy()
df_avg_delay_causes_1
df_avg_delay_causes_1['aircraft_delay(min/flight)'] = df_avg_delay_causes_1['late_aircraft_delay'] /df_avg_delay_causes_1['late_aircraft_ct']
df_avg_delay_causes_1.drop(['late_aircraft_ct'],axis = 1, inplace = True)
df_avg_delay_causes_1.drop(['late_aircraft_delay'],axis = 1, inplace = True)
print(df_avg_delay_causes_1.shape)
df_avg_delay_causes_1.head(2)
(289637, 21)
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | carrier_ct | weather_ct | nas_ct | security_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | weather_delay | nas_delay | security_delay | def_avg_Delay | aircraft_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | PBI | West Palm Beach/Palm Beach, FL: Palm Beach Int... | 650.0 | 126.0 | 21.06 | 6.44 | 51.58 | 1.0 | 4.0 | 0.0 | 5425.0 | 881.0 | 397.0 | 2016.0 | 15.0 | 477.021712 | 46.080139 |
| 1 | 2004 | 1 | DL | Delta Air Lines Inc. | PDX | Portland, OR: Portland International | 314.0 | 61.0 | 14.09 | 2.61 | 34.25 | 0.0 | 30.0 | 3.0 | 2801.0 | 478.0 | 239.0 | 1365.0 | 0.0 | -919.978288 | 71.542289 |
# display a static summary of the dataframe
df_avg_delay_causes_1.describe()
| year | month | arr_flights | arr_del15 | carrier_ct | weather_ct | nas_ct | security_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | weather_delay | nas_delay | security_delay | def_avg_Delay | aircraft_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 289637.000000 | 289637.000000 | 289193.000000 | 288968.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 289193.000000 | 2.891930e+05 | 2.561530e+05 |
| mean | 2011.562369 | 6.516215 | 389.580585 | 75.004270 | 21.097864 | 2.675956 | 25.127567 | 0.173963 | 7.331284 | 0.887660 | 4293.218484 | 1272.710221 | 221.832617 | 1152.813906 | 6.881021 | -2.912466e-13 | inf |
| std | 5.161139 | 3.426717 | 1042.744693 | 203.225578 | 46.928801 | 10.173334 | 88.057001 | 0.829669 | 38.028441 | 3.989153 | 12701.694767 | 3410.400202 | 861.617050 | 4809.587421 | 36.899165 | 5.004833e+03 | NaN |
| min | 2003.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | -0.010000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | -19.000000 | 0.000000 | -1.638978e+03 | 1.428571e+01 |
| 25% | 2007.000000 | 4.000000 | 60.000000 | 10.000000 | 3.230000 | 0.000000 | 1.950000 | 0.000000 | 0.000000 | 0.000000 | 464.000000 | 157.000000 | 0.000000 | 63.000000 | 0.000000 | -1.550978e+03 | 4.505495e+01 |
| 50% | 2011.000000 | 7.000000 | 124.000000 | 24.000000 | 8.530000 | 0.620000 | 5.920000 | 0.000000 | 1.000000 | 0.000000 | 1254.000000 | 450.000000 | 27.000000 | 218.000000 | 0.000000 | -1.264978e+03 | 5.864979e+01 |
| 75% | 2016.000000 | 9.000000 | 279.000000 | 58.000000 | 20.000000 | 2.080000 | 16.120000 | 0.000000 | 4.000000 | 1.000000 | 3158.000000 | 1107.000000 | 163.000000 | 632.000000 | 0.000000 | -4.859783e+02 | 7.396450e+01 |
| max | 2020.000000 | 12.000000 | 21977.000000 | 6377.000000 | 1792.070000 | 717.940000 | 4091.270000 | 80.560000 | 4951.000000 | 256.000000 | 433687.000000 | 196944.000000 | 57707.000000 | 238440.000000 | 3194.000000 | 1.465420e+05 | inf |
#replace zero values with nan
df_avg_delay_causes.copy()
df_avg_delay_causes_2 = df_avg_delay_causes.replace(0, np.nan)
df_avg_delay_causes_2
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | carrier_ct | weather_ct | nas_ct | security_ct | late_aircraft_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | weather_delay | nas_delay | security_delay | late_aircraft_delay | def_avg_Delay | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | PBI | West Palm Beach/Palm Beach, FL: Palm Beach Int... | 650.0 | 126.0 | 21.06 | 6.44 | 51.58 | 1.0 | 45.92 | 4.0 | NaN | 5425.0 | 881.0 | 397.0 | 2016.0 | 15.0 | 2116.0 | 477.021712 |
| 1 | 2004 | 1 | DL | Delta Air Lines Inc. | PDX | Portland, OR: Portland International | 314.0 | 61.0 | 14.09 | 2.61 | 34.25 | NaN | 10.05 | 30.0 | 3.0 | 2801.0 | 478.0 | 239.0 | 1365.0 | NaN | 719.0 | -919.978288 |
| 2 | 2004 | 1 | DL | Delta Air Lines Inc. | PHL | Philadelphia, PA: Philadelphia International | 513.0 | 97.0 | 27.60 | 0.42 | 51.86 | NaN | 17.12 | 15.0 | NaN | 4261.0 | 1150.0 | 16.0 | 2286.0 | NaN | 809.0 | -829.978288 |
| 3 | 2004 | 1 | DL | Delta Air Lines Inc. | PHX | Phoenix, AZ: Phoenix Sky Harbor International | 334.0 | 78.0 | 20.14 | 2.02 | 39.39 | NaN | 16.45 | 3.0 | 1.0 | 3400.0 | 1159.0 | 166.0 | 1295.0 | NaN | 780.0 | -858.978288 |
| 4 | 2004 | 1 | DL | Delta Air Lines Inc. | PIT | Pittsburgh, PA: Pittsburgh International | 217.0 | 47.0 | 8.08 | 0.44 | 21.89 | NaN | 16.59 | 4.0 | 1.0 | 1737.0 | 350.0 | 28.0 | 522.0 | NaN | 837.0 | -801.978288 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 289632 | 2019 | 1 | MQ | Envoy Air | RIC | Richmond, VA: Richmond International | 195.0 | 68.0 | 12.12 | 1.87 | 17.97 | NaN | 36.04 | 22.0 | NaN | 4138.0 | 603.0 | 196.0 | 780.0 | NaN | 2559.0 | 920.021712 |
| 289633 | 2019 | 1 | MQ | Envoy Air | ROA | Roanoke, VA: Roanoke Blacksburg Regional Woodr... | 52.0 | 14.0 | 2.74 | 0.69 | 2.46 | NaN | 8.11 | 2.0 | 1.0 | 726.0 | 323.0 | 31.0 | 112.0 | NaN | 260.0 | -1378.978288 |
| 289634 | 2019 | 1 | MQ | Envoy Air | ROC | Rochester, NY: Greater Rochester International | 106.0 | 26.0 | 4.67 | 2.26 | 11.81 | NaN | 7.26 | 17.0 | NaN | 1259.0 | 313.0 | 99.0 | 484.0 | NaN | 363.0 | -1275.978288 |
| 289635 | 2019 | 1 | MQ | Envoy Air | RST | Rochester, MN: Rochester International | 116.0 | 35.0 | 6.83 | 6.92 | 11.50 | NaN | 9.75 | 22.0 | NaN | 2307.0 | 474.0 | 633.0 | 523.0 | NaN | 677.0 | -961.978288 |
| 289636 | 2019 | 1 | MQ | Envoy Air | SAT | San Antonio, TX: San Antonio International | 26.0 | 4.0 | 1.16 | 0.64 | 1.92 | NaN | 0.29 | 5.0 | NaN | 120.0 | 50.0 | 14.0 | 41.0 | NaN | 15.0 | -1623.978288 |
289637 rows × 22 columns
# display a sum of zero value in the dataframe for each columan
# we will drop zero value as we will level up report data in additional steps
print((df_avg_delay_causes_2 == 0).sum())
year 0 month 0 carrier 0 carrier_name 0 AIRPORT 0 airport_name 0 arr_flights 0 arr_del15 0 carrier_ct 0 weather_ct 0 nas_ct 0 security_ct 0 late_aircraft_ct 0 arr_cancelled 0 arr_diverted 0 arr_delay 0 carrier_delay 0 weather_delay 0 nas_delay 0 security_delay 0 late_aircraft_delay 0 def_avg_Delay 0 dtype: int64
# display a summary of the dataframe info
df_avg_delay_causes_2.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 289637 entries, 0 to 289636 Data columns (total 22 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 year 289637 non-null int64 1 month 289637 non-null int64 2 carrier 289637 non-null object 3 carrier_name 289637 non-null object 4 AIRPORT 289637 non-null object 5 airport_name 289637 non-null object 6 arr_flights 289193 non-null float64 7 arr_del15 281373 non-null float64 8 carrier_ct 274385 non-null float64 9 weather_ct 176557 non-null float64 10 nas_ct 265593 non-null float64 11 security_ct 37898 non-null float64 12 late_aircraft_ct 256148 non-null float64 13 arr_cancelled 173288 non-null float64 14 arr_diverted 81511 non-null float64 15 arr_delay 281373 non-null float64 16 carrier_delay 274387 non-null float64 17 weather_delay 176579 non-null float64 18 nas_delay 265618 non-null float64 19 security_delay 37906 non-null float64 20 late_aircraft_delay 256153 non-null float64 21 def_avg_Delay 289193 non-null float64 dtypes: float64(16), int64(2), object(4) memory usage: 48.6+ MB
#drop NaN value
df_avg_delay_causes_3=df_avg_delay_causes_2.dropna()
delay_causes_df=df_avg_delay_causes_3.copy()
delay_causes_df.head(2)
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | carrier_ct | weather_ct | nas_ct | security_ct | late_aircraft_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | weather_delay | nas_delay | security_delay | late_aircraft_delay | def_avg_Delay | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | 2004 | 1 | DL | Delta Air Lines Inc. | SLC | Salt Lake City, UT: Salt Lake City International | 3012.0 | 756.0 | 163.51 | 33.04 | 427.21 | 1.60 | 130.64 | 38.0 | 1.0 | 31310.0 | 8466.0 | 2516.0 | 13218.0 | 100.0 | 7010.0 | 5371.021712 |
| 60 | 2004 | 1 | EV | Atlantic Southeast Airlines | DFW | Dallas/Fort Worth, TX: Dallas/Fort Worth Inter... | 4322.0 | 582.0 | 152.53 | 91.84 | 270.74 | 4.89 | 62.00 | 128.0 | 26.0 | 31531.0 | 10027.0 | 7696.0 | 10580.0 | 134.0 | 3094.0 | 1455.021712 |
#create usa_airports csv file ,Read the csv file, and check its top 2 rows
delay_causes_df.to_csv (r'C:\Users\Abdelrazek\PycharmProjects\Flight\delay_causes.csv', index = False, header=True)
df_delay_causes= pd.read_csv('delay_causes.csv')
df_delay_causes.head(2)
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | carrier_ct | weather_ct | nas_ct | security_ct | late_aircraft_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | weather_delay | nas_delay | security_delay | late_aircraft_delay | def_avg_Delay | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | SLC | Salt Lake City, UT: Salt Lake City International | 3012.0 | 756.0 | 163.51 | 33.04 | 427.21 | 1.60 | 130.64 | 38.0 | 1.0 | 31310.0 | 8466.0 | 2516.0 | 13218.0 | 100.0 | 7010.0 | 5371.021712 |
| 1 | 2004 | 1 | EV | Atlantic Southeast Airlines | DFW | Dallas/Fort Worth, TX: Dallas/Fort Worth Inter... | 4322.0 | 582.0 | 152.53 | 91.84 | 270.74 | 4.89 | 62.00 | 128.0 | 26.0 | 31531.0 | 10027.0 | 7696.0 | 10580.0 | 134.0 | 3094.0 | 1455.021712 |
# display a static summary of the dataframe
df_delay_causes.describe()
| year | month | arr_flights | arr_del15 | carrier_ct | weather_ct | nas_ct | security_ct | late_aircraft_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | weather_delay | nas_delay | security_delay | late_aircraft_delay | def_avg_Delay | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 | 16903.000000 |
| mean | 2011.244217 | 6.414542 | 2609.453233 | 505.065965 | 119.759147 | 17.025426 | 177.586204 | 1.809004 | 188.886271 | 48.069869 | 6.970064 | 29411.084068 | 7809.031947 | 1447.946992 | 8369.994143 | 72.017512 | 11712.063362 | 10073.085074 |
| std | 4.981090 | 3.435825 | 2844.515359 | 568.002107 | 127.641867 | 34.618548 | 264.992637 | 2.496775 | 218.250819 | 104.642630 | 13.054701 | 36489.734227 | 10024.313609 | 2855.229331 | 14711.519261 | 115.603377 | 14286.426579 | 14286.426579 |
| min | 2003.000000 | 1.000000 | 21.000000 | 4.000000 | 0.270000 | 0.010000 | 0.170000 | 0.010000 | 0.020000 | 1.000000 | 1.000000 | 108.000000 | 21.000000 | 1.000000 | 5.000000 | 1.000000 | 1.000000 | -1637.978288 |
| 25% | 2007.000000 | 3.000000 | 564.000000 | 127.000000 | 37.000000 | 2.790000 | 31.320000 | 0.570000 | 36.750000 | 5.000000 | 1.000000 | 6848.500000 | 2013.500000 | 190.000000 | 1203.000000 | 16.000000 | 2351.500000 | 712.521712 |
| 50% | 2011.000000 | 6.000000 | 1575.000000 | 304.000000 | 79.590000 | 6.890000 | 80.660000 | 1.000000 | 108.820000 | 16.000000 | 3.000000 | 16449.000000 | 4480.000000 | 537.000000 | 3201.000000 | 36.000000 | 6656.000000 | 5017.021712 |
| 75% | 2016.000000 | 9.000000 | 3819.500000 | 690.000000 | 161.000000 | 17.565000 | 212.920000 | 2.090000 | 263.750000 | 49.000000 | 7.000000 | 37734.000000 | 9909.000000 | 1451.500000 | 8925.000000 | 85.000000 | 15778.000000 | 14139.021712 |
| max | 2020.000000 | 12.000000 | 21977.000000 | 6377.000000 | 1792.070000 | 717.940000 | 4091.270000 | 80.560000 | 1885.470000 | 3154.000000 | 256.000000 | 433687.000000 | 196944.000000 | 57707.000000 | 238440.000000 | 3194.000000 | 148181.000000 | 146542.021712 |
#level up and calculate the Delay in (min/flight) for each airport and carrier
#add a new column in the dataframe name 'aircraft_delay(min/flight)'
df_delay_causes
df_delay_causes['aircraft_delay(min/flight)'] = df_delay_causes['late_aircraft_delay'] /df_delay_causes['late_aircraft_ct']/df_delay_causes['arr_flights']
df_delay_causes.drop(['late_aircraft_ct'],axis = 1, inplace = True)
df_delay_causes.drop(['late_aircraft_delay'],axis = 1, inplace = True)
print(df_delay_causes.shape)
df_delay_causes.head(2)
(16903, 21)
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | carrier_ct | weather_ct | nas_ct | security_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | weather_delay | nas_delay | security_delay | def_avg_Delay | aircraft_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | SLC | Salt Lake City, UT: Salt Lake City International | 3012.0 | 756.0 | 163.51 | 33.04 | 427.21 | 1.60 | 38.0 | 1.0 | 31310.0 | 8466.0 | 2516.0 | 13218.0 | 100.0 | 5371.021712 | 0.017815 |
| 1 | 2004 | 1 | EV | Atlantic Southeast Airlines | DFW | Dallas/Fort Worth, TX: Dallas/Fort Worth Inter... | 4322.0 | 582.0 | 152.53 | 91.84 | 270.74 | 4.89 | 128.0 | 26.0 | 31531.0 | 10027.0 | 7696.0 | 10580.0 | 134.0 | 1455.021712 | 0.011546 |
#summary of the dataframe
df_delay_causes.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 16903 entries, 0 to 16902 Data columns (total 21 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 year 16903 non-null int64 1 month 16903 non-null int64 2 carrier 16903 non-null object 3 carrier_name 16903 non-null object 4 AIRPORT 16903 non-null object 5 airport_name 16903 non-null object 6 arr_flights 16903 non-null float64 7 arr_del15 16903 non-null float64 8 carrier_ct 16903 non-null float64 9 weather_ct 16903 non-null float64 10 nas_ct 16903 non-null float64 11 security_ct 16903 non-null float64 12 arr_cancelled 16903 non-null float64 13 arr_diverted 16903 non-null float64 14 arr_delay 16903 non-null float64 15 carrier_delay 16903 non-null float64 16 weather_delay 16903 non-null float64 17 nas_delay 16903 non-null float64 18 security_delay 16903 non-null float64 19 def_avg_Delay 16903 non-null float64 20 aircraft_delay(min/flight) 16903 non-null float64 dtypes: float64(15), int64(2), object(4) memory usage: 2.7+ MB
#level up and calculate the Delay in (min/flight) for each airport and carrier
#add a new column in the dataframe name 'nas_delay(min/flight)'
df_nas_delay_causes = df_delay_causes.copy()
df_nas_delay_causes
df_nas_delay_causes['nas_delay(min/flight)'] = df_nas_delay_causes['nas_delay'] /df_nas_delay_causes['nas_ct']/df_delay_causes['arr_flights']
df_nas_delay_causes.drop(['nas_ct'],axis = 1, inplace = True)
df_nas_delay_causes.drop(['nas_delay'],axis = 1, inplace = True)
print(df_nas_delay_causes.shape)
df_nas_delay_causes.head(2)
(16903, 20)
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | carrier_ct | weather_ct | security_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | weather_delay | security_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | SLC | Salt Lake City, UT: Salt Lake City International | 3012.0 | 756.0 | 163.51 | 33.04 | 1.60 | 38.0 | 1.0 | 31310.0 | 8466.0 | 2516.0 | 100.0 | 5371.021712 | 0.017815 | 0.010272 |
| 1 | 2004 | 1 | EV | Atlantic Southeast Airlines | DFW | Dallas/Fort Worth, TX: Dallas/Fort Worth Inter... | 4322.0 | 582.0 | 152.53 | 91.84 | 4.89 | 128.0 | 26.0 | 31531.0 | 10027.0 | 7696.0 | 134.0 | 1455.021712 | 0.011546 | 0.009042 |
#level up and calculate the Delay in (min/flight) for each airport and carrier
#add a new column in the dataframe name 'weather_delay(min/flight)'
df_weather_delay_causes = df_nas_delay_causes.copy()
df_weather_delay_causes
df_weather_delay_causes['weather_delay(min/flight)'] = df_weather_delay_causes['weather_delay'] /df_weather_delay_causes['weather_ct']/df_delay_causes['arr_flights']
df_weather_delay_causes.drop(['weather_ct'],axis = 1, inplace = True)
df_weather_delay_causes.drop(['weather_delay'],axis = 1, inplace = True)
print(df_weather_delay_causes.shape)
df_weather_delay_causes.head(2)
(16903, 19)
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | carrier_ct | security_ct | arr_cancelled | arr_diverted | arr_delay | carrier_delay | security_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | SLC | Salt Lake City, UT: Salt Lake City International | 3012.0 | 756.0 | 163.51 | 1.60 | 38.0 | 1.0 | 31310.0 | 8466.0 | 100.0 | 5371.021712 | 0.017815 | 0.010272 | 0.025282 |
| 1 | 2004 | 1 | EV | Atlantic Southeast Airlines | DFW | Dallas/Fort Worth, TX: Dallas/Fort Worth Inter... | 4322.0 | 582.0 | 152.53 | 4.89 | 128.0 | 26.0 | 31531.0 | 10027.0 | 134.0 | 1455.021712 | 0.011546 | 0.009042 | 0.019389 |
#level up and calculate the Delay in (min/flight) for each airport and carrier
#add a new column in the dataframe name 'carrier_delay(min/flight)'
df_carrier_delay_causes = df_weather_delay_causes.copy()
df_carrier_delay_causes
df_carrier_delay_causes['carrier_delay(min/flight)'] = df_carrier_delay_causes['carrier_delay'] /df_carrier_delay_causes['carrier_ct']/df_delay_causes['arr_flights']
df_carrier_delay_causes.drop(['carrier_ct'],axis = 1, inplace = True)
df_carrier_delay_causes.drop(['carrier_delay'],axis = 1, inplace = True)
print(df_carrier_delay_causes.shape)
df_carrier_delay_causes.head(2)
(16903, 18)
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | security_ct | arr_cancelled | arr_diverted | arr_delay | security_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | SLC | Salt Lake City, UT: Salt Lake City International | 3012.0 | 756.0 | 1.60 | 38.0 | 1.0 | 31310.0 | 100.0 | 5371.021712 | 0.017815 | 0.010272 | 0.025282 | 0.01719 |
| 1 | 2004 | 1 | EV | Atlantic Southeast Airlines | DFW | Dallas/Fort Worth, TX: Dallas/Fort Worth Inter... | 4322.0 | 582.0 | 4.89 | 128.0 | 26.0 | 31531.0 | 134.0 | 1455.021712 | 0.011546 | 0.009042 | 0.019389 | 0.01521 |
#level up and calculate the Delay in (min/flight) for each airport and carrier
#add a new column in the dataframe name 'security_delay(min/flight)'
df_security_delay_causes = df_carrier_delay_causes.copy()
df_security_delay_causes
df_security_delay_causes['security_delay(min/flight)'] = df_carrier_delay_causes['security_delay'] /df_carrier_delay_causes['security_ct']/df_delay_causes['arr_flights']
df_security_delay_causes.drop(['security_ct'],axis = 1, inplace = True)
df_security_delay_causes.drop(['security_delay'],axis = 1, inplace = True)
print(df_security_delay_causes.shape)
df_security_delay_causes.head(2)
(16903, 17)
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | SLC | Salt Lake City, UT: Salt Lake City International | 3012.0 | 756.0 | 38.0 | 1.0 | 31310.0 | 5371.021712 | 0.017815 | 0.010272 | 0.025282 | 0.01719 | 0.02075 |
| 1 | 2004 | 1 | EV | Atlantic Southeast Airlines | DFW | Dallas/Fort Worth, TX: Dallas/Fort Worth Inter... | 4322.0 | 582.0 | 128.0 | 26.0 | 31531.0 | 1455.021712 | 0.011546 | 0.009042 | 0.019389 | 0.01521 | 0.00634 |
#create df_delay_main , and check its top 2 rows
df_delay_main =df_security_delay_causes.copy()
df_delay_main.head(2)
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | SLC | Salt Lake City, UT: Salt Lake City International | 3012.0 | 756.0 | 38.0 | 1.0 | 31310.0 | 5371.021712 | 0.017815 | 0.010272 | 0.025282 | 0.01719 | 0.02075 |
| 1 | 2004 | 1 | EV | Atlantic Southeast Airlines | DFW | Dallas/Fort Worth, TX: Dallas/Fort Worth Inter... | 4322.0 | 582.0 | 128.0 | 26.0 | 31531.0 | 1455.021712 | 0.011546 | 0.009042 | 0.019389 | 0.01521 | 0.00634 |
#Explore what the histogram of the data looks like after level up data
df_delay_main.hist(figsize=(25,25));
#create delay_main csv file ,Read the csv file, and check its top 2 rows
df_delay_main.to_csv (r'C:\Users\Abdelrazek\PycharmProjects\Flight\delay_main.csv', index = False, header=True)
df_delay_main = pd.read_csv('delay_main.csv')
df_delay_main.head(2)
| year | month | carrier | carrier_name | AIRPORT | airport_name | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2004 | 1 | DL | Delta Air Lines Inc. | SLC | Salt Lake City, UT: Salt Lake City International | 3012.0 | 756.0 | 38.0 | 1.0 | 31310.0 | 5371.021712 | 0.017815 | 0.010272 | 0.025282 | 0.01719 | 0.02075 |
| 1 | 2004 | 1 | EV | Atlantic Southeast Airlines | DFW | Dallas/Fort Worth, TX: Dallas/Fort Worth Inter... | 4322.0 | 582.0 | 128.0 | 26.0 | 31531.0 | 1455.021712 | 0.011546 | 0.009042 | 0.019389 | 0.01521 | 0.00634 |
# count carrier_name
df_delay_main.carrier_name.value_counts()
Southwest Airlines Co. 4567 American Airlines Inc. 2168 SkyWest Airlines Inc. 1399 ExpressJet Airlines Inc. 1386 JetBlue Airways 1065 US Airways Inc. 978 Delta Air Lines Inc. 655 Alaska Airlines Inc. 645 Continental Air Lines Inc. 638 United Air Lines Inc. 410 Northwest Airlines Inc. 409 American Eagle Airlines Inc. 382 Mesa Airlines Inc. 323 Spirit Air Lines 290 Comair Inc. 240 Envoy Air 238 Atlantic Southeast Airlines 223 PSA Airlines Inc. 152 Republic Airline 139 Pinnacle Airlines Inc. 133 Virgin America 108 America West Airlines Inc. 89 Hawaiian Airlines Inc. 72 Allegiant Air 63 Atlantic Coast Airlines 37 Endeavor Air Inc. 36 Independence Air 22 ATA Airlines d/b/a ATA 19 Frontier Airlines Inc. 15 Aloha Airlines Inc. 2 Name: carrier_name, dtype: int64
#group data by carrier,year and month for each carrier
df_delay=df_delay_main.copy()
df_delay=df_delay.groupby(['carrier_name','year','month']).sum()
df_delay
| arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | |||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| carrier_name | year | month | |||||||||||
| ATA Airlines d/b/a ATA | 2003 | 6 | 2139.0 | 262.0 | 20.0 | 1.0 | 12309.0 | 3443.021712 | 0.028037 | 0.015420 | 0.016867 | 0.033437 | 0.017120 |
| 7 | 2203.0 | 492.0 | 24.0 | 5.0 | 35539.0 | 11074.021712 | 0.038449 | 0.028346 | 0.036791 | 0.043473 | 0.011595 | ||
| 8 | 2215.0 | 369.0 | 19.0 | 2.0 | 24233.0 | 7866.021712 | 0.035180 | 0.025011 | 0.027006 | 0.039702 | 0.014085 | ||
| 9 | 2245.0 | 193.0 | 32.0 | 1.0 | 10265.0 | 2488.021712 | 0.027951 | 0.018261 | 0.039190 | 0.028907 | 0.007572 | ||
| 11 | 2627.0 | 589.0 | 52.0 | 13.0 | 46136.0 | 12092.043424 | 0.428060 | 0.169892 | 0.164054 | 0.184669 | 0.261687 | ||
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Virgin America | 2017 | 11 | 3317.0 | 785.0 | 51.0 | 34.0 | 41625.0 | 6216.065136 | 0.929567 | 0.633304 | 0.587863 | 0.634352 | 0.524803 |
| 12 | 3317.0 | 541.0 | 13.0 | 26.0 | 26605.0 | 6072.043424 | 0.077100 | 0.046869 | 0.041259 | 0.066558 | 0.039039 | ||
| 2018 | 1 | 3081.0 | 510.0 | 55.0 | 5.0 | 26490.0 | 2567.043424 | 0.085732 | 0.050578 | 0.065614 | 0.092953 | 0.056478 | |
| 2 | 1735.0 | 341.0 | 10.0 | 39.0 | 16835.0 | 2922.021712 | 0.038433 | 0.024163 | 0.027858 | 0.031171 | 0.017190 | ||
| 3 | 3831.0 | 1019.0 | 183.0 | 13.0 | 62723.0 | 10966.065136 | 0.250225 | 0.247854 | 0.364222 | 0.251469 | 0.222125 |
2653 rows × 11 columns
#group data by carrier
df_carrier_delay=df_delay_main.copy()
carrier_delay=df_carrier_delay.groupby(['carrier_name']).sum()
print(carrier_delay.shape)
carrier_delay.head(2)
(30, 13)
| year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| carrier_name | |||||||||||||
| ATA Airlines d/b/a ATA | 38076 | 119 | 31878.0 | 6443.0 | 437.0 | 50.0 | 411255.0 | 1.374684e+05 | 2.002676 | 1.029865 | 1.857479 | 1.442589 | 0.853221 |
| Alaska Airlines Inc. | 1297217 | 4269 | 1197926.0 | 213502.0 | 15385.0 | 2446.0 | 10731444.0 | 3.618192e+06 | 91.927932 | 54.736939 | 89.282937 | 98.342151 | 61.206725 |
#create usa_carrier csv file
carrier_delay.to_csv (r'C:\Users\Abdelrazek\PycharmProjects\Flight\carrier_delay.csv', index = True, header=True)
carrier_delay_1 = pd.read_csv('carrier_delay.csv')
carrier_delay_1.head(2)
| carrier_name | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ATA Airlines d/b/a ATA | 38076 | 119 | 31878.0 | 6443.0 | 437.0 | 50.0 | 411255.0 | 1.374684e+05 | 2.002676 | 1.029865 | 1.857479 | 1.442589 | 0.853221 |
| 1 | Alaska Airlines Inc. | 1297217 | 4269 | 1197926.0 | 213502.0 | 15385.0 | 2446.0 | 10731444.0 | 3.618192e+06 | 91.927932 | 54.736939 | 89.282937 | 98.342151 | 61.206725 |
#reindex dataframe
carrier_delay_df=carrier_delay_1.reindex
carrier_delay_df=carrier_delay_1.set_index('carrier_name')
carrier_delay_df.head()
| year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| carrier_name | |||||||||||||
| ATA Airlines d/b/a ATA | 38076 | 119 | 31878.0 | 6443.0 | 437.0 | 50.0 | 411255.0 | 1.374684e+05 | 2.002676 | 1.029865 | 1.857479 | 1.442589 | 0.853221 |
| Alaska Airlines Inc. | 1297217 | 4269 | 1197926.0 | 213502.0 | 15385.0 | 2446.0 | 10731444.0 | 3.618192e+06 | 91.927932 | 54.736939 | 89.282937 | 98.342151 | 61.206725 |
| Allegiant Air | 127181 | 326 | 36854.0 | 9207.0 | 1239.0 | 142.0 | 725316.0 | 2.342964e+05 | 12.643597 | 8.291653 | 14.119938 | 15.598210 | 5.271055 |
| Aloha Airlines Inc. | 4015 | 3 | 3066.0 | 167.0 | 22.0 | 2.0 | 6207.0 | -1.189566e+02 | 0.050396 | 0.038607 | 0.028420 | 0.050248 | 0.049424 |
| America West Airlines Inc. | 178371 | 611 | 245686.0 | 39735.0 | 3494.0 | 284.0 | 1883085.0 | 4.536789e+05 | 7.736412 | 5.150561 | 9.069546 | 6.975657 | 4.223227 |
#reduce arrival flight Dividing Values by 1000
carrier_delay_df2=carrier_delay_1.copy()
carrier_delay_df2['arr_flights'] = (carrier_delay_df2['arr_flights'] / 1000)
carrier_delay_df2.head()
| carrier_name | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ATA Airlines d/b/a ATA | 38076 | 119 | 31.878 | 6443.0 | 437.0 | 50.0 | 411255.0 | 1.374684e+05 | 2.002676 | 1.029865 | 1.857479 | 1.442589 | 0.853221 |
| 1 | Alaska Airlines Inc. | 1297217 | 4269 | 1197.926 | 213502.0 | 15385.0 | 2446.0 | 10731444.0 | 3.618192e+06 | 91.927932 | 54.736939 | 89.282937 | 98.342151 | 61.206725 |
| 2 | Allegiant Air | 127181 | 326 | 36.854 | 9207.0 | 1239.0 | 142.0 | 725316.0 | 2.342964e+05 | 12.643597 | 8.291653 | 14.119938 | 15.598210 | 5.271055 |
| 3 | Aloha Airlines Inc. | 4015 | 3 | 3.066 | 167.0 | 22.0 | 2.0 | 6207.0 | -1.189566e+02 | 0.050396 | 0.038607 | 0.028420 | 0.050248 | 0.049424 |
| 4 | America West Airlines Inc. | 178371 | 611 | 245.686 | 39735.0 | 3494.0 | 284.0 | 1883085.0 | 4.536789e+05 | 7.736412 | 5.150561 | 9.069546 | 6.975657 | 4.223227 |
carrier_delay_df2.columns
Index(['carrier_name', 'year', 'month', 'arr_flights', 'arr_del15',
'arr_cancelled', 'arr_diverted', 'arr_delay', 'def_avg_Delay',
'aircraft_delay(min/flight)', 'nas_delay(min/flight)',
'weather_delay(min/flight)', 'carrier_delay(min/flight)',
'security_delay(min/flight)'],
dtype='object')
#get the most carrier agent make flights
most_aircraft_flights=carrier_delay_df2.sort_values(by='arr_flights',ascending=False )
most_aircraft_flights.plot(kind="bar",x='carrier_name',y='arr_flights',figsize=(20,10)),sorted
print('The Mean of Arrival Flight/1000')
print(most_aircraft_flights['arr_flights'].mean())
print('Southwest Airlines Co is the most carrier agent makeing flights')
plt.show()
The Mean of Arrival Flight/1000 1470.2529333333332 Southwest Airlines Co is the most carrier agent makeing flights
#get the most carrier agent make delay
most_aircraft_delay=carrier_delay_df2.sort_values(by='aircraft_delay(min/flight)',ascending=False )
most_aircraft_delay.plot(kind="bar",x='carrier_name',y='aircraft_delay(min/flight)',figsize=(20,10)),sorted
print('The Mean of Aircraft Delay(min/flight)')
print(most_aircraft_delay['aircraft_delay(min/flight)'].mean())
print('ExpressJet Airlines Inc is the most carrier agent makeing Delay')
plt.show()
The Mean of Aircraft Delay(min/flight) 53.01640733771036 ExpressJet Airlines Inc is the most carrier agent makeing Delay
#get the most carrier agent make delay more than def_avg_Delay
carrier_delay_df2['def_avg_Delay'] = (carrier_delay_df2['def_avg_Delay'] / 1000)
carrier_delay_df2.plot(kind='area',x='carrier_name',y='def_avg_Delay', color='red',stacked=False ,figsize=(20,10)),sorted
print('The Mean of def_avg_Delay/1000')
print(most_aircraft_delay['def_avg_Delay'].mean())
plt.show()
The Mean of def_avg_Delay/1000 5675511.9000122715
#group data by airports
df_airports_delay=df_delay_main.copy()
airports_delay=df_airports_delay.groupby(['AIRPORT']).sum()
print(airports_delay.shape)
airports_delay.head(2)
(254, 13)
| year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AIRPORT | |||||||||||||
| ABE | 6029 | 15 | 391.0 | 119.0 | 12.0 | 3.0 | 8453.0 | -2605.934864 | 1.953684 | 1.895971 | 2.112518 | 1.573425 | 0.950482 |
| ABI | 10075 | 33 | 1026.0 | 224.0 | 20.0 | 5.0 | 11507.0 | -3153.891439 | 1.430998 | 1.039528 | 2.207824 | 1.106756 | 1.220141 |
#create usa_airports csv file
airports_delay.to_csv (r'C:\Users\Abdelrazek\PycharmProjects\Flight\airports_delay.csv', index = True, header=True)
airports_delay = pd.read_csv('airports_delay.csv')
airports_delay.head(2)
| AIRPORT | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABE | 6029 | 15 | 391.0 | 119.0 | 12.0 | 3.0 | 8453.0 | -2605.934864 | 1.953684 | 1.895971 | 2.112518 | 1.573425 | 0.950482 |
| 1 | ABI | 10075 | 33 | 1026.0 | 224.0 | 20.0 | 5.0 | 11507.0 | -3153.891439 | 1.430998 | 1.039528 | 2.207824 | 1.106756 | 1.220141 |
airports_delay.reindex
<bound method DataFrame.reindex of AIRPORT year month arr_flights arr_del15 arr_cancelled \
0 ABE 6029 15 391.0 119.0 12.0
1 ABI 10075 33 1026.0 224.0 20.0
2 ABQ 221093 743 140850.0 26747.0 781.0
3 ACK 6028 24 251.0 84.0 24.0
4 ACT 2018 7 146.0 35.0 10.0
.. ... ... ... ... ... ...
249 VLD 2007 7 87.0 33.0 3.0
250 VPS 22125 95 1721.0 513.0 56.0
251 WRG 6026 21 186.0 46.0 10.0
252 XNA 48256 148 6225.0 1528.0 225.0
253 XWA 2019 12 79.0 17.0 1.0
arr_diverted arr_delay def_avg_Delay aircraft_delay(min/flight) \
0 3.0 8453.0 -2605.934864 1.953684
1 5.0 11507.0 -3153.891439 1.430998
2 347.0 1214923.0 530594.388336 8.682357
3 6.0 5261.0 -3740.934864 2.461677
4 1.0 1688.0 -906.978288 0.314142
.. ... ... ... ...
249 1.0 2157.0 -1316.978288 0.536398
250 18.0 29499.0 -8180.761166 6.237859
251 6.0 2399.0 -3518.934864 2.247519
252 43.0 87985.0 -6303.478909 8.160817
253 5.0 1701.0 -606.978288 1.777319
nas_delay(min/flight) weather_delay(min/flight) \
0 1.895971 2.112518
1 1.039528 2.207824
2 5.353606 10.648411
3 2.770015 4.746400
4 0.235193 0.911444
.. ... ...
249 0.731452 0.752459
250 3.339884 6.542708
251 2.037139 2.370968
252 5.002513 10.286498
253 0.449459 1.338156
carrier_delay(min/flight) security_delay(min/flight)
0 1.573425 0.950482
1 1.106756 1.220141
2 7.687052 6.416964
3 2.687697 3.259593
4 0.354659 0.353513
.. ... ...
249 0.842673 0.433390
250 6.983359 4.257582
251 3.587611 1.597255
252 6.925848 3.949613
253 1.140559 0.240346
[254 rows x 14 columns]>
#join df_airports and df_airports to get airports delay with LATITUDE and LONGITUDE
df_geo=pd.merge(airports_delay ,usa_airports_df_1 ,how ='left',on='AIRPORT')
df_geo
| AIRPORT | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | AIRPORT_ID | DISPLAY_AIRPORT_NAME | DISPLAY_AIRPORT_CITY_NAME_FULL | AIRPORT_COUNTRY_NAME | AIRPORT_STATE_NAME | DISPLAY_CITY_MARKET_NAME_FULL | LATITUDE | LONGITUDE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABE | 6029 | 15 | 391.0 | 119.0 | 12.0 | 3.0 | 8453.0 | -2605.934864 | 1.953684 | 1.895971 | 2.112518 | 1.573425 | 0.950482 | 10135 | Lehigh Valley International | Allentown/Bethlehem/Easton, PA | United States | Pennsylvania | Allentown/Bethlehem/Easton, PA | 40.654722 | -75.438333 |
| 1 | ABE | 6029 | 15 | 391.0 | 119.0 | 12.0 | 3.0 | 8453.0 | -2605.934864 | 1.953684 | 1.895971 | 2.112518 | 1.573425 | 0.950482 | 10135 | Lehigh Valley International | Allentown/Bethlehem/Easton, PA | United States | Pennsylvania | Allentown/Bethlehem/Easton, PA | 40.652222 | -75.440278 |
| 2 | ABE | 6029 | 15 | 391.0 | 119.0 | 12.0 | 3.0 | 8453.0 | -2605.934864 | 1.953684 | 1.895971 | 2.112518 | 1.573425 | 0.950482 | 10135 | Lehigh Valley International | Allentown/Bethlehem/Easton, PA | United States | Pennsylvania | Allentown/Bethlehem/Easton, PA | 40.652222 | -75.440556 |
| 3 | ABE | 6029 | 15 | 391.0 | 119.0 | 12.0 | 3.0 | 8453.0 | -2605.934864 | 1.953684 | 1.895971 | 2.112518 | 1.573425 | 0.950482 | 10135 | Lehigh Valley International | Allentown/Bethlehem/Easton, PA | United States | Pennsylvania | Allentown/Bethlehem/Easton, PA | 40.652222 | -75.440278 |
| 4 | ABE | 6029 | 15 | 391.0 | 119.0 | 12.0 | 3.0 | 8453.0 | -2605.934864 | 1.953684 | 1.895971 | 2.112518 | 1.573425 | 0.950482 | 10135 | Lehigh Valley International | Allentown/Bethlehem/Easton, PA | United States | Pennsylvania | Allentown/Bethlehem/Easton, PA | 40.652222 | -75.440556 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 892 | XNA | 48256 | 148 | 6225.0 | 1528.0 | 225.0 | 43.0 | 87985.0 | -6303.478909 | 8.160817 | 5.002513 | 10.286498 | 6.925848 | 3.949613 | 15919 | Northwest Arkansas Regional | Fayetteville, AR | United States | Arkansas | Fayetteville, AR | 36.281667 | -94.307778 |
| 893 | XNA | 48256 | 148 | 6225.0 | 1528.0 | 225.0 | 43.0 | 87985.0 | -6303.478909 | 8.160817 | 5.002513 | 10.286498 | 6.925848 | 3.949613 | 15919 | Northwest Arkansas Regional | Fayetteville, AR | United States | Arkansas | Fayetteville, AR | 36.281944 | -94.306944 |
| 894 | XNA | 48256 | 148 | 6225.0 | 1528.0 | 225.0 | 43.0 | 87985.0 | -6303.478909 | 8.160817 | 5.002513 | 10.286498 | 6.925848 | 3.949613 | 15919 | Northwest Arkansas Regional | Fayetteville, AR | United States | Arkansas | Fayetteville, AR | 36.281667 | -94.307778 |
| 895 | XWA | 2019 | 12 | 79.0 | 17.0 | 1.0 | 5.0 | 1701.0 | -606.978288 | 1.777319 | 0.449459 | 1.338156 | 1.140559 | 0.240346 | 15934 | Granite Point Ndb | Granite Point, AK | United States | Alaska | Granite Point, AK | 60.961667 | -151.331667 |
| 896 | XWA | 2019 | 12 | 79.0 | 17.0 | 1.0 | 5.0 | 1701.0 | -606.978288 | 1.777319 | 0.449459 | 1.338156 | 1.140559 | 0.240346 | 16869 | Williston Basin International | Williston, ND | United States | North Dakota | Williston, ND | 48.260833 | -103.751111 |
897 rows × 22 columns
df_geo_1=df_geo.drop_duplicates('AIRPORT')
print(df_geo_1.shape)
df_geo_1.head()
(254, 22)
| AIRPORT | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | AIRPORT_ID | DISPLAY_AIRPORT_NAME | DISPLAY_AIRPORT_CITY_NAME_FULL | AIRPORT_COUNTRY_NAME | AIRPORT_STATE_NAME | DISPLAY_CITY_MARKET_NAME_FULL | LATITUDE | LONGITUDE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABE | 6029 | 15 | 391.0 | 119.0 | 12.0 | 3.0 | 8453.0 | -2605.934864 | 1.953684 | 1.895971 | 2.112518 | 1.573425 | 0.950482 | 10135 | Lehigh Valley International | Allentown/Bethlehem/Easton, PA | United States | Pennsylvania | Allentown/Bethlehem/Easton, PA | 40.654722 | -75.438333 |
| 6 | ABI | 10075 | 33 | 1026.0 | 224.0 | 20.0 | 5.0 | 11507.0 | -3153.891439 | 1.430998 | 1.039528 | 2.207824 | 1.106756 | 1.220141 | 10136 | Abilene Regional | Abilene, TX | United States | Texas | Abilene, TX | 32.409444 | -99.679722 |
| 9 | ABQ | 221093 | 743 | 140850.0 | 26747.0 | 781.0 | 347.0 | 1214923.0 | 530594.388336 | 8.682357 | 5.353606 | 10.648411 | 7.687052 | 6.416964 | 10140 | Albuquerque International Sunport | Albuquerque, NM | United States | New Mexico | Albuquerque, NM | 35.041667 | -106.606389 |
| 14 | ACK | 6028 | 24 | 251.0 | 84.0 | 24.0 | 6.0 | 5261.0 | -3740.934864 | 2.461677 | 2.770015 | 4.746400 | 2.687697 | 3.259593 | 10154 | Nantucket Memorial | Nantucket, MA | United States | Massachusetts | Nantucket, MA | 41.256667 | -70.059722 |
| 19 | ACT | 2018 | 7 | 146.0 | 35.0 | 10.0 | 1.0 | 1688.0 | -906.978288 | 0.314142 | 0.235193 | 0.911444 | 0.354659 | 0.353513 | 10155 | Waco Regional | Waco, TX | United States | Texas | Waco, TX | 31.611389 | -97.228611 |
#create usa_airports geo csv file
df_geo_1.to_csv (r'C:\Users\Abdelrazek\PycharmProjects\Flight\airports_geo.csv', index = False, header=True)
df_airports_geo = pd.read_csv('airports_geo.csv')
df_airports_geo.head(2)
| AIRPORT | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | AIRPORT_ID | DISPLAY_AIRPORT_NAME | DISPLAY_AIRPORT_CITY_NAME_FULL | AIRPORT_COUNTRY_NAME | AIRPORT_STATE_NAME | DISPLAY_CITY_MARKET_NAME_FULL | LATITUDE | LONGITUDE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABE | 6029 | 15 | 391.0 | 119.0 | 12.0 | 3.0 | 8453.0 | -2605.934864 | 1.953684 | 1.895971 | 2.112518 | 1.573425 | 0.950482 | 10135 | Lehigh Valley International | Allentown/Bethlehem/Easton, PA | United States | Pennsylvania | Allentown/Bethlehem/Easton, PA | 40.654722 | -75.438333 |
| 1 | ABI | 10075 | 33 | 1026.0 | 224.0 | 20.0 | 5.0 | 11507.0 | -3153.891439 | 1.430998 | 1.039528 | 2.207824 | 1.106756 | 1.220141 | 10136 | Abilene Regional | Abilene, TX | United States | Texas | Abilene, TX | 32.409444 | -99.679722 |
# Combining Lattitude and Longitude to create coordinates:
df_airports_geo_1=df_airports_geo.copy()
df_airports_geo_1['coordinates'] = df_airports_geo_1[['LONGITUDE','LATITUDE']].values.tolist()
df_airports_geo_1.head()
| AIRPORT | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | AIRPORT_ID | DISPLAY_AIRPORT_NAME | DISPLAY_AIRPORT_CITY_NAME_FULL | AIRPORT_COUNTRY_NAME | AIRPORT_STATE_NAME | DISPLAY_CITY_MARKET_NAME_FULL | LATITUDE | LONGITUDE | coordinates | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABE | 6029 | 15 | 391.0 | 119.0 | 12.0 | 3.0 | 8453.0 | -2605.934864 | 1.953684 | 1.895971 | 2.112518 | 1.573425 | 0.950482 | 10135 | Lehigh Valley International | Allentown/Bethlehem/Easton, PA | United States | Pennsylvania | Allentown/Bethlehem/Easton, PA | 40.654722 | -75.438333 | [-75.43833333, 40.65472222] |
| 1 | ABI | 10075 | 33 | 1026.0 | 224.0 | 20.0 | 5.0 | 11507.0 | -3153.891439 | 1.430998 | 1.039528 | 2.207824 | 1.106756 | 1.220141 | 10136 | Abilene Regional | Abilene, TX | United States | Texas | Abilene, TX | 32.409444 | -99.679722 | [-99.67972222, 32.40944444] |
| 2 | ABQ | 221093 | 743 | 140850.0 | 26747.0 | 781.0 | 347.0 | 1214923.0 | 530594.388336 | 8.682357 | 5.353606 | 10.648411 | 7.687052 | 6.416964 | 10140 | Albuquerque International Sunport | Albuquerque, NM | United States | New Mexico | Albuquerque, NM | 35.041667 | -106.606389 | [-106.60638889, 35.04166667] |
| 3 | ACK | 6028 | 24 | 251.0 | 84.0 | 24.0 | 6.0 | 5261.0 | -3740.934864 | 2.461677 | 2.770015 | 4.746400 | 2.687697 | 3.259593 | 10154 | Nantucket Memorial | Nantucket, MA | United States | Massachusetts | Nantucket, MA | 41.256667 | -70.059722 | [-70.05972222, 41.25666667] |
| 4 | ACT | 2018 | 7 | 146.0 | 35.0 | 10.0 | 1.0 | 1688.0 | -906.978288 | 0.314142 | 0.235193 | 0.911444 | 0.354659 | 0.353513 | 10155 | Waco Regional | Waco, TX | United States | Texas | Waco, TX | 31.611389 | -97.228611 | [-97.22861111, 31.61138889] |
# Change the coordinates to a geoPoint
from shapely.geometry import Point
df_airports_geo_1['coordinates'] = df_airports_geo_1['coordinates'].apply(Point)
df_airports_geo_1.head()
| AIRPORT | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | AIRPORT_ID | DISPLAY_AIRPORT_NAME | DISPLAY_AIRPORT_CITY_NAME_FULL | AIRPORT_COUNTRY_NAME | AIRPORT_STATE_NAME | DISPLAY_CITY_MARKET_NAME_FULL | LATITUDE | LONGITUDE | coordinates | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABE | 6029 | 15 | 391.0 | 119.0 | 12.0 | 3.0 | 8453.0 | -2605.934864 | 1.953684 | 1.895971 | 2.112518 | 1.573425 | 0.950482 | 10135 | Lehigh Valley International | Allentown/Bethlehem/Easton, PA | United States | Pennsylvania | Allentown/Bethlehem/Easton, PA | 40.654722 | -75.438333 | POINT (-75.43833333000001 40.65472222) |
| 1 | ABI | 10075 | 33 | 1026.0 | 224.0 | 20.0 | 5.0 | 11507.0 | -3153.891439 | 1.430998 | 1.039528 | 2.207824 | 1.106756 | 1.220141 | 10136 | Abilene Regional | Abilene, TX | United States | Texas | Abilene, TX | 32.409444 | -99.679722 | POINT (-99.67972222 32.40944444) |
| 2 | ABQ | 221093 | 743 | 140850.0 | 26747.0 | 781.0 | 347.0 | 1214923.0 | 530594.388336 | 8.682357 | 5.353606 | 10.648411 | 7.687052 | 6.416964 | 10140 | Albuquerque International Sunport | Albuquerque, NM | United States | New Mexico | Albuquerque, NM | 35.041667 | -106.606389 | POINT (-106.60638889 35.04166667) |
| 3 | ACK | 6028 | 24 | 251.0 | 84.0 | 24.0 | 6.0 | 5261.0 | -3740.934864 | 2.461677 | 2.770015 | 4.746400 | 2.687697 | 3.259593 | 10154 | Nantucket Memorial | Nantucket, MA | United States | Massachusetts | Nantucket, MA | 41.256667 | -70.059722 | POINT (-70.05972222 41.25666667) |
| 4 | ACT | 2018 | 7 | 146.0 | 35.0 | 10.0 | 1.0 | 1688.0 | -906.978288 | 0.314142 | 0.235193 | 0.911444 | 0.354659 | 0.353513 | 10155 | Waco Regional | Waco, TX | United States | Texas | Waco, TX | 31.611389 | -97.228611 | POINT (-97.22861111 31.61138889) |
df_airports_geo_1.to_csv (r'C:\Users\Abdelrazek\PycharmProjects\Flight\airports_coordinates.csv', index = False, header=True)
geo= pd.read_csv('airports_coordinates.csv')
geo.head(2)
| AIRPORT | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | AIRPORT_ID | DISPLAY_AIRPORT_NAME | DISPLAY_AIRPORT_CITY_NAME_FULL | AIRPORT_COUNTRY_NAME | AIRPORT_STATE_NAME | DISPLAY_CITY_MARKET_NAME_FULL | LATITUDE | LONGITUDE | coordinates | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABE | 6029 | 15 | 391.0 | 119.0 | 12.0 | 3.0 | 8453.0 | -2605.934864 | 1.953684 | 1.895971 | 2.112518 | 1.573425 | 0.950482 | 10135 | Lehigh Valley International | Allentown/Bethlehem/Easton, PA | United States | Pennsylvania | Allentown/Bethlehem/Easton, PA | 40.654722 | -75.438333 | POINT (-75.43833333000001 40.65472222) |
| 1 | ABI | 10075 | 33 | 1026.0 | 224.0 | 20.0 | 5.0 | 11507.0 | -3153.891439 | 1.430998 | 1.039528 | 2.207824 | 1.106756 | 1.220141 | 10136 | Abilene Regional | Abilene, TX | United States | Texas | Abilene, TX | 32.409444 | -99.679722 | POINT (-99.67972222 32.40944444) |
#us airport distribution by state and Weather Delay info
print ('US Airports distribution by state and Weather Delay info')
fig = px.scatter_mapbox(geo, lat="LATITUDE", lon="LONGITUDE", hover_name="AIRPORT_STATE_NAME", hover_data=["AIRPORT_STATE_NAME", "weather_delay(min/flight)"],
color_discrete_sequence=["fuchsia"], zoom=3, height=300)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
US Airports distribution by state and Weather Delay info
#get the most airports makeing weather_delay
most_airports_delay=geo.sort_values(by='weather_delay(min/flight)',ascending=False )
most_airports_delay.head(30).plot(kind="bar",x='DISPLAY_AIRPORT_NAME',y='weather_delay(min/flight)',figsize=(20,10)),sorted
print('The Mean of Weather Delay(min/flight)')
print(geo['weather_delay(min/flight)'].mean())
print('Chicago O Hare International is the Most Airports with weather Delay')
plt.show()
The Mean of Weather Delay(min/flight) 7.83864917649772 Chicago O Hare International is the Most Airports with weather Delay
#using Google API Maps
import gmaps
gmaps.configure(api_key="AIzaSyAIX8emBZKPsXTbOxs_fqHfm0lIalYQ1_8")# Fill in with your API key
#us airport distribution by state and Weather Delay info #using Google API Maps #map view
#print('Map of US Airports shown the most affected with Weather Delay')
locations = geo[['LATITUDE', 'LONGITUDE']]
weights = geo['weather_delay(min/flight)']
fig = gmaps.figure()
fig.add_layer(gmaps.heatmap_layer(locations, weights=weights))
fig
# What is Most US Airport by Longitude, Latitude had Flight Cancelled
print('Flight Cancelled by US Airport Longitude, Latitude')
geo.plot(kind="scatter", x="LONGITUDE", y="LATITUDE",
s=geo['arr_cancelled'], label="Flight Cancelled",
c="arr_cancelled", cmap=plt.get_cmap("jet"),
colorbar=True, alpha=0.4, figsize=(20,10),
)
plt.legend()
plt.show()
Flight Cancelled by US Airport Longitude, Latitude
#us airport distribution by state and Carrier Delay info #using Google API Maps #satellite view
#print('Map satellite view of US Airports shown the most affected with Carrier Delay')
locations = df_geo_1[['LATITUDE', 'LONGITUDE']]
weights = df_geo_1['carrier_delay(min/flight)']
fig = gmaps.figure()
fig.add_layer(gmaps.heatmap_layer(locations, weights=weights))
fig
#group data by state
df_state_delay=geo.copy()
df_state_delay=df_state_delay.groupby(['AIRPORT_STATE_NAME']).sum()
print(df_state_delay.shape)
df_state_delay.head(5)
(51, 16)
| year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | AIRPORT_ID | LATITUDE | LONGITUDE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AIRPORT_STATE_NAME | ||||||||||||||||
| Alabama | 174894 | 533 | 31254.0 | 7324.0 | 477.0 | 195.0 | 392864.0 | 4.343589e+04 | 23.176639 | 13.905337 | 24.608511 | 20.349085 | 14.857079 | 60823 | 162.514167 | -433.611389 |
| Alaska | 560842 | 1890 | 199511.0 | 38313.0 | 4092.0 | 763.0 | 1859119.0 | 3.999081e+05 | 72.540015 | 42.167852 | 67.599382 | 75.776281 | 48.684258 | 171681 | 802.553611 | -1906.075278 |
| Arizona | 1088309 | 3496 | 2170576.0 | 353602.0 | 27423.0 | 3200.0 | 17363210.0 | 6.942681e+06 | 25.194640 | 14.252868 | 31.687516 | 22.507049 | 19.001906 | 51644 | 133.997778 | -446.277222 |
| Arkansas | 118594 | 322 | 17083.0 | 4308.0 | 457.0 | 110.0 | 238549.0 | 1.012628e+04 | 16.854263 | 11.062468 | 19.620841 | 14.060666 | 10.004956 | 44312 | 104.467500 | -280.530556 |
| California | 4442767 | 14487 | 4451001.0 | 934650.0 | 72561.0 | 8304.0 | 50659675.0 | 1.911307e+07 | 140.458252 | 94.647788 | 174.499397 | 126.540595 | 98.222667 | 253731 | 683.773056 | -2282.100556 |
#create usa_state csv file
df_state_delay.to_csv (r'C:\Users\Abdelrazek\PycharmProjects\Flight\state_delay.csv', index = True, header=True)
df_state_delay = pd.read_csv('state_delay.csv')
df_state_delay.head(2)
| AIRPORT_STATE_NAME | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | AIRPORT_ID | LATITUDE | LONGITUDE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | 174894 | 533 | 31254.0 | 7324.0 | 477.0 | 195.0 | 392864.0 | 43435.888957 | 23.176639 | 13.905337 | 24.608511 | 20.349085 | 14.857079 | 60823 | 162.514167 | -433.611389 |
| 1 | Alaska | 560842 | 1890 | 199511.0 | 38313.0 | 4092.0 | 763.0 | 1859119.0 | 399908.057688 | 72.540015 | 42.167852 | 67.599382 | 75.776281 | 48.684258 | 171681 | 802.553611 | -1906.075278 |
#rename columns to apply merge
df_state_delay.rename(columns={'AIRPORT_STATE_NAME' : 'STATE_NAME'}, inplace=True)
df_state_delay.head(2)
| STATE_NAME | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | AIRPORT_ID | LATITUDE | LONGITUDE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | 174894 | 533 | 31254.0 | 7324.0 | 477.0 | 195.0 | 392864.0 | 43435.888957 | 23.176639 | 13.905337 | 24.608511 | 20.349085 | 14.857079 | 60823 | 162.514167 | -433.611389 |
| 1 | Alaska | 560842 | 1890 | 199511.0 | 38313.0 | 4092.0 | 763.0 | 1859119.0 | 399908.057688 | 72.540015 | 42.167852 | 67.599382 | 75.776281 | 48.684258 | 171681 | 802.553611 | -1906.075278 |
#join df_state_delay and df_airports to get states delay with LATITUDE and LONGITUDE
df_state = pd.read_csv('state.csv')
df_state_geo=pd.merge(df_state_delay ,df_state ,how ='outer',on='STATE_NAME')
df_state_geo.drop(['LATITUDE'],axis = 1, inplace = True)
df_state_geo.drop(['LONGITUDE'],axis = 1, inplace = True)
print(df_state_geo.shape)
df_state_geo.head(2)
(53, 18)
| STATE_NAME | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | AIRPORT_ID | state | latitude | longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | 174894.0 | 533.0 | 31254.0 | 7324.0 | 477.0 | 195.0 | 392864.0 | 43435.888957 | 23.176639 | 13.905337 | 24.608511 | 20.349085 | 14.857079 | 60823.0 | AL | 32.318231 | -86.902298 |
| 1 | Alaska | 560842.0 | 1890.0 | 199511.0 | 38313.0 | 4092.0 | 763.0 | 1859119.0 | 399908.057688 | 72.540015 | 42.167852 | 67.599382 | 75.776281 | 48.684258 | 171681.0 | AK | 63.588753 | -154.493062 |
#drop NaN value
most_state_delay=df_state_geo.dropna()
most_state_delay
| STATE_NAME | year | month | arr_flights | arr_del15 | arr_cancelled | arr_diverted | arr_delay | def_avg_Delay | aircraft_delay(min/flight) | nas_delay(min/flight) | weather_delay(min/flight) | carrier_delay(min/flight) | security_delay(min/flight) | AIRPORT_ID | state | latitude | longitude | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | 174894.0 | 533.0 | 31254.0 | 7324.0 | 477.0 | 195.0 | 392864.0 | 4.343589e+04 | 23.176639 | 13.905337 | 24.608511 | 20.349085 | 14.857079 | 60823.0 | AL | 32.318231 | -86.902298 |
| 1 | Alaska | 560842.0 | 1890.0 | 199511.0 | 38313.0 | 4092.0 | 763.0 | 1859119.0 | 3.999081e+05 | 72.540015 | 42.167852 | 67.599382 | 75.776281 | 48.684258 | 171681.0 | AK | 63.588753 | -154.493062 |
| 2 | Arizona | 1088309.0 | 3496.0 | 2170576.0 | 353602.0 | 27423.0 | 3200.0 | 17363210.0 | 6.942681e+06 | 25.194640 | 14.252868 | 31.687516 | 22.507049 | 19.001906 | 51644.0 | AZ | 34.048928 | -111.093731 |
| 3 | Arkansas | 118594.0 | 322.0 | 17083.0 | 4308.0 | 457.0 | 110.0 | 238549.0 | 1.012628e+04 | 16.854263 | 11.062468 | 19.620841 | 14.060666 | 10.004956 | 44312.0 | AR | 35.201050 | -91.831833 |
| 4 | California | 4442767.0 | 14487.0 | 4451001.0 | 934650.0 | 72561.0 | 8304.0 | 50659675.0 | 1.911307e+07 | 140.458252 | 94.647788 | 174.499397 | 126.540595 | 98.222667 | 253731.0 | CA | 36.778261 | -119.417932 |
| 5 | Colorado | 1164865.0 | 3768.0 | 1797350.0 | 319220.0 | 30323.0 | 5497.0 | 17936969.0 | 7.004795e+06 | 45.882180 | 30.596063 | 59.205471 | 39.804111 | 34.893473 | 81112.0 | CO | 39.550051 | -105.782067 |
| 6 | Connecticut | 86539.0 | 258.0 | 16982.0 | 4127.0 | 410.0 | 61.0 | 223704.0 | 3.575893e+04 | 11.553794 | 6.860782 | 11.647598 | 9.967518 | 10.435934 | 22773.0 | CT | 41.603221 | -73.087749 |
| 7 | Florida | 3410419.0 | 10469.0 | 2570463.0 | 536859.0 | 37296.0 | 7301.0 | 31394528.0 | 1.068237e+07 | 159.069192 | 96.573227 | 205.048553 | 138.325719 | 107.037159 | 254267.0 | FL | 27.664827 | -81.515754 |
| 8 | Georgia | 1140523.0 | 3722.0 | 4222231.0 | 798309.0 | 67653.0 | 9168.0 | 48669467.0 | 1.367817e+07 | 39.296094 | 27.655914 | 46.506182 | 35.300878 | 26.096351 | 61628.0 | GA | 32.157435 | -82.907123 |
| 9 | Hawaii | 185285.0 | 543.0 | 169549.0 | 17815.0 | 548.0 | 184.0 | 828966.0 | 1.315830e+05 | 8.375436 | 4.554025 | 8.217690 | 7.824536 | 4.465801 | 64145.0 | HI | 19.898682 | -155.665857 |
| 10 | Idaho | 82401.0 | 256.0 | 19921.0 | 4528.0 | 335.0 | 179.0 | 250032.0 | 5.111589e+04 | 5.386463 | 4.675555 | 7.256379 | 4.885384 | 3.352492 | 52147.0 | ID | 44.068202 | -114.742041 |
| 11 | Illinois | 1916615.0 | 6212.0 | 4270081.0 | 917197.0 | 128270.0 | 12608.0 | 60090124.0 | 2.148794e+07 | 49.347658 | 39.415959 | 61.362082 | 47.399445 | 30.396643 | 80656.0 | IL | 40.633125 | -89.398528 |
| 12 | Indiana | 171057.0 | 476.0 | 50398.0 | 10450.0 | 864.0 | 158.0 | 595416.0 | 1.345338e+05 | 14.343574 | 8.845460 | 20.058901 | 12.186182 | 9.300214 | 50470.0 | IN | 40.551217 | -85.602364 |
| 13 | Iowa | 72438.0 | 210.0 | 5995.0 | 1524.0 | 218.0 | 48.0 | 90311.0 | -2.201422e+04 | 18.452184 | 10.114847 | 20.161632 | 17.020332 | 11.552583 | 22426.0 | IA | 41.878003 | -93.097702 |
| 14 | Kansas | 42223.0 | 133.0 | 3564.0 | 936.0 | 101.0 | 34.0 | 55877.0 | -1.354854e+04 | 9.294125 | 5.796324 | 11.280299 | 9.687760 | 4.232890 | 36400.0 | KS | 39.011902 | -98.484246 |
| 15 | Kentucky | 359495.0 | 1182.0 | 557606.0 | 92792.0 | 15359.0 | 934.0 | 4901023.0 | 2.736379e+05 | 23.364392 | 14.573633 | 28.157904 | 20.242065 | 13.015305 | 38868.0 | KY | 37.839333 | -84.270018 |
| 16 | Louisiana | 398159.0 | 1301.0 | 157499.0 | 32529.0 | 1925.0 | 619.0 | 1610686.0 | 4.687433e+05 | 36.319641 | 24.366391 | 48.680205 | 32.088021 | 26.408298 | 88518.0 | LA | 31.244823 | -92.145024 |
| 17 | Maine | 22106.0 | 74.0 | 1524.0 | 401.0 | 76.0 | 12.0 | 23660.0 | -1.033076e+04 | 6.196723 | 3.632133 | 6.160224 | 5.922273 | 4.075046 | 14321.0 | ME | 45.253783 | -69.445469 |
| 18 | Maryland | 541186.0 | 1777.0 | 1042827.0 | 172060.0 | 15324.0 | 2440.0 | 9078749.0 | 4.169426e+06 | 16.096310 | 12.156596 | 19.922873 | 13.916502 | 13.529028 | 10821.0 | MD | 39.045755 | -76.641271 |
| 19 | Massachusetts | 714082.0 | 2262.0 | 659739.0 | 155397.0 | 17145.0 | 982.0 | 10212525.0 | 3.614586e+06 | 24.113056 | 18.921477 | 29.306564 | 21.582233 | 19.040323 | 34808.0 | MA | 42.407211 | -71.382437 |
| 20 | Michigan | 705975.0 | 2192.0 | 1152559.0 | 202037.0 | 21296.0 | 1919.0 | 12222041.0 | 3.905214e+06 | 36.995008 | 22.506305 | 57.580109 | 36.812484 | 25.916875 | 101782.0 | MI | 44.314844 | -85.602364 |
| 21 | Minnesota | 647794.0 | 2040.0 | 1117503.0 | 190427.0 | 13896.0 | 2421.0 | 11389860.0 | 3.090917e+06 | 23.400347 | 15.759311 | 31.114372 | 26.790697 | 15.604405 | 39457.0 | MN | 46.729553 | -94.685900 |
| 22 | Mississippi | 98459.0 | 338.0 | 8400.0 | 1987.0 | 202.0 | 65.0 | 111258.0 | -3.775294e+04 | 20.353293 | 12.697574 | 23.781204 | 18.588193 | 12.094801 | 49669.0 | MS | 32.354668 | -89.398528 |
| 23 | Missouri | 886679.0 | 2821.0 | 712665.0 | 134684.0 | 8132.0 | 1824.0 | 6808912.0 | 2.833484e+06 | 37.791031 | 22.382046 | 47.985046 | 34.088225 | 27.248339 | 42997.0 | MO | 37.964253 | -91.831833 |
| 24 | Montana | 28144.0 | 110.0 | 3288.0 | 799.0 | 43.0 | 30.0 | 42324.0 | -8.891696e+03 | 4.266636 | 3.214638 | 5.341405 | 3.846828 | 3.104403 | 58606.0 | MT | 46.879682 | -110.362566 |
| 25 | Nebraska | 102558.0 | 303.0 | 16729.0 | 4258.0 | 348.0 | 89.0 | 235498.0 | 1.698611e+04 | 12.442129 | 8.017317 | 16.624440 | 11.742984 | 8.694517 | 26900.0 | NE | 41.492537 | -99.901813 |
| 26 | Nevada | 1027822.0 | 3305.0 | 1609266.0 | 299088.0 | 16018.0 | 2436.0 | 14520866.0 | 6.599594e+06 | 28.570199 | 19.140359 | 40.344102 | 26.044822 | 19.026226 | 27459.0 | NV | 38.802610 | -116.419389 |
| 27 | New Hampshire | 78317.0 | 287.0 | 23622.0 | 4826.0 | 309.0 | 70.0 | 256190.0 | 7.956685e+04 | 5.587196 | 3.796896 | 7.291724 | 4.670433 | 3.209757 | 13296.0 | NH | 43.193852 | -71.572395 |
| 28 | New Jersey | 745784.0 | 2421.0 | 1070273.0 | 321158.0 | 31446.0 | 4354.0 | 22563502.0 | 4.660431e+06 | 26.591985 | 22.261499 | 31.234871 | 23.911285 | 17.362485 | 21776.0 | NJ | 40.058324 | -74.405661 |
| 29 | New Mexico | 229153.0 | 757.0 | 141198.0 | 26830.0 | 804.0 | 353.0 | 1219452.0 | 5.251915e+05 | 11.108694 | 7.143790 | 14.048833 | 10.012401 | 7.867997 | 39402.0 | NM | 34.972730 | -105.032363 |
| 30 | New York | 2079743.0 | 6474.0 | 1580809.0 | 394541.0 | 44946.0 | 6771.0 | 26001815.0 | 6.655109e+06 | 105.406317 | 81.713966 | 127.434728 | 96.988827 | 82.078696 | 148285.0 | NY | 43.299428 | -74.217933 |
| 31 | North Carolina | 971823.0 | 3109.0 | 1626602.0 | 261474.0 | 30926.0 | 3122.0 | 14421908.0 | 5.131910e+06 | 53.488548 | 34.065315 | 57.986388 | 45.241784 | 35.655505 | 97351.0 | NC | 35.759573 | -79.019300 |
| 32 | North Dakota | 20135.0 | 60.0 | 1530.0 | 320.0 | 53.0 | 22.0 | 19386.0 | -9.414783e+03 | 4.536785 | 2.664696 | 9.065061 | 3.955043 | 2.331429 | 60114.0 | ND | 47.551493 | -101.002012 |
| 33 | Ohio | 532446.0 | 1650.0 | 388744.0 | 75811.0 | 5777.0 | 860.0 | 4294131.0 | 1.730893e+06 | 30.180474 | 18.774995 | 34.787170 | 27.473984 | 22.558903 | 59544.0 | OH | 40.417287 | -82.907123 |
| 34 | Oklahoma | 229080.0 | 747.0 | 43393.0 | 10266.0 | 709.0 | 251.0 | 518441.0 | 7.108248e+04 | 23.540018 | 14.931700 | 31.644275 | 19.615421 | 14.949808 | 42112.0 | OK | 35.007752 | -97.092877 |
| 35 | Oregon | 354008.0 | 1162.0 | 167158.0 | 30683.0 | 2443.0 | 375.0 | 1531785.0 | 5.241928e+05 | 16.363689 | 10.109026 | 23.690029 | 15.089873 | 10.891813 | 67377.0 | OR | 43.804133 | -120.554201 |
| 36 | Pennsylvania | 764536.0 | 2397.0 | 763387.0 | 158507.0 | 15639.0 | 2103.0 | 9448856.0 | 2.468642e+06 | 39.588168 | 30.113093 | 49.433688 | 33.650904 | 30.398281 | 61689.0 | PA | 41.203322 | -77.194525 |
| 37 | Puerto Rico | 287616.0 | 913.0 | 115146.0 | 28097.0 | 1383.0 | 278.0 | 1687954.0 | 2.934551e+05 | 20.475652 | 11.699072 | 29.051136 | 17.406754 | 14.684168 | 39829.0 | PR | 18.220833 | -66.590149 |
| 38 | Rhode Island | 114566.0 | 343.0 | 40535.0 | 8304.0 | 572.0 | 118.0 | 459189.0 | 1.424922e+05 | 8.336403 | 5.583992 | 10.449085 | 6.342208 | 4.663637 | 14307.0 | RI | 41.580095 | -71.477429 |
| 39 | South Carolina | 156770.0 | 526.0 | 19463.0 | 4651.0 | 543.0 | 100.0 | 287043.0 | -2.569031e+04 | 25.000947 | 16.752272 | 38.330362 | 22.458661 | 15.932646 | 71258.0 | SC | 33.836081 | -81.163725 |
| 40 | South Dakota | 14097.0 | 38.0 | 1508.0 | 282.0 | 48.0 | 10.0 | 19082.0 | -5.740848e+03 | 4.773531 | 2.305849 | 2.963445 | 3.368048 | 2.714362 | 26232.0 | SD | 43.969515 | -99.901813 |
| 41 | Tennessee | 767830.0 | 2473.0 | 654056.0 | 114742.0 | 7843.0 | 1370.0 | 5928128.0 | 2.052750e+06 | 37.010752 | 22.880406 | 44.529004 | 33.048228 | 24.555663 | 50329.0 | TN | 35.517491 | -86.580447 |
| 42 | Texas | 3610182.0 | 11508.0 | 7529551.0 | 1360285.0 | 140712.0 | 30186.0 | 79203635.0 | 3.161296e+07 | 130.197929 | 87.968360 | 172.297749 | 114.104142 | 84.175203 | 268617.0 | TX | 31.968599 | -99.901813 |
| 44 | Utah | 577063.0 | 1833.0 | 988687.0 | 139308.0 | 11083.0 | 1464.0 | 7390630.0 | 2.680912e+06 | 10.998714 | 6.397857 | 17.703200 | 10.809367 | 7.777507 | 14869.0 | UT | 39.320980 | -111.093731 |
| 45 | Vermont | 18084.0 | 67.0 | 1302.0 | 306.0 | 42.0 | 10.0 | 18658.0 | -7.302805e+03 | 4.727564 | 2.934127 | 5.205761 | 3.668832 | 3.136130 | 10785.0 | VT | 44.558803 | -72.577841 |
| 46 | Virginia | 1166377.0 | 3736.0 | 836086.0 | 163144.0 | 24974.0 | 2363.0 | 10025312.0 | 3.544582e+06 | 65.570230 | 41.625959 | 82.441222 | 56.156926 | 44.372712 | 104798.0 | VA | 37.431573 | -78.656894 |
| 47 | Washington | 828721.0 | 2733.0 | 1010843.0 | 182730.0 | 10002.0 | 1746.0 | 9159378.0 | 3.343476e+06 | 29.348457 | 18.586619 | 45.817620 | 27.110200 | 20.479240 | 51549.0 | WA | 47.751074 | -120.740139 |
| 48 | West Virginia | 28172.0 | 89.0 | 2910.0 | 667.0 | 108.0 | 28.0 | 41359.0 | -5.266696e+03 | 4.859206 | 3.248971 | 4.315428 | 4.130487 | 4.943429 | 11146.0 | WV | 38.597626 | -80.454903 |
| 49 | Wisconsin | 211199.0 | 583.0 | 63855.0 | 14228.0 | 1329.0 | 263.0 | 812700.0 | 2.151153e+05 | 17.578866 | 11.461796 | 21.843796 | 16.664608 | 13.981228 | 60415.0 | WI | 43.784440 | -88.787868 |
| 50 | Wyoming | 10058.0 | 16.0 | 821.0 | 197.0 | 31.0 | 8.0 | 11651.0 | -4.358891e+03 | 2.169019 | 1.421155 | 2.458842 | 3.983456 | 1.336606 | 38097.0 | WY | 43.075968 | -107.290284 |
#get the most state makeing weather_delay
most_state_delay=most_state_delay.sort_values(by='weather_delay(min/flight)',ascending=False )
most_state_delay.plot(kind="bar",x='STATE_NAME',y='weather_delay(min/flight)',figsize=(20,10)),sorted
print('The Mean of Weather Delay(min/flight) by US State ')
print(most_state_delay['weather_delay(min/flight)'].mean())
print('Florida,California,Texas is the Most US State affected by Weather Delay')
plt.show()
The Mean of Weather Delay(min/flight) by US State 39.75564591352922 Florida,California,Texas is the Most US State affected by Weather Delay
#us state and Weather Delay info #using Google API Maps #satellite view
#print('Map view of Most US State affected by Weather Delay')
#print('Florida,California,Texas is the Most US State affected by Weather Delay')
locations = most_state_delay[['latitude', 'longitude']]
weights = most_state_delay['weather_delay(min/flight)']
fig = gmaps.figure()
fig.add_layer(gmaps.heatmap_layer(locations, weights=weights))
fig
ax = geo.plot(kind="scatter", x="LONGITUDE", y="LATITUDE", figsize=(10,7),colorbar=False, alpha=0.4,)
plt.ylabel("LATITUDE", fontsize=14)
plt.xlabel("LONGITUDE", fontsize=14)
state = geo["AIRPORT_STATE_NAME"]
plt.legend(["AIRPORT_STATE_NAME"])
plt.show()
!jupyter nbconvert airports.ipynb --to html
!jupyter nbconvert airports.ipynb --to slides --post serve --no-input --no-prompt
#!jupyter nbconvert airports.html --to pdf --template classic
#!jupyter nbconvert airports.ipynb --to slides --reveal-prefix reveal.js --post serve